Interpretability and Explainability in Machine Learning

We start by loading the dataset and generating a summary to better understand the data we will be working with.

concrete <- as.data.frame(read_excel("Concrete_Data.xls"))
DescVars <- names(concrete)
names(concrete) <- c("Cement","Slag","FlyAsh","Water","Superplast","CoarseAggr","FineAggr","Age","Strength")
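The snippets in this document assume the following packages are loaded (the exact set is inferred from the function calls used below):

```r
# Packages used throughout this analysis (inferred from the calls below)
library(readxl)        # read_excel
library(ranger)        # fast Random Forests
library(randomForest)  # randomForest (used later with relev.ghost.var)
library(vip)           # variable-importance plots and Shapley values
library(gridExtra)     # grid.arrange
library(mgcv)          # gam
library(DALEX)         # explain, model_parts, predict_parts, ...
```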

summary(concrete)
##      Cement           Slag           FlyAsh           Water      
##  Min.   :102.0   Min.   :  0.0   Min.   :  0.00   Min.   :121.8  
##  1st Qu.:192.4   1st Qu.:  0.0   1st Qu.:  0.00   1st Qu.:164.9  
##  Median :272.9   Median : 22.0   Median :  0.00   Median :185.0  
##  Mean   :281.2   Mean   : 73.9   Mean   : 54.19   Mean   :181.6  
##  3rd Qu.:350.0   3rd Qu.:142.9   3rd Qu.:118.27   3rd Qu.:192.0  
##  Max.   :540.0   Max.   :359.4   Max.   :200.10   Max.   :247.0  
##    Superplast       CoarseAggr        FineAggr          Age        
##  Min.   : 0.000   Min.   : 801.0   Min.   :594.0   Min.   :  1.00  
##  1st Qu.: 0.000   1st Qu.: 932.0   1st Qu.:731.0   1st Qu.:  7.00  
##  Median : 6.350   Median : 968.0   Median :779.5   Median : 28.00  
##  Mean   : 6.203   Mean   : 972.9   Mean   :773.6   Mean   : 45.66  
##  3rd Qu.:10.160   3rd Qu.:1029.4   3rd Qu.:824.0   3rd Qu.: 56.00  
##  Max.   :32.200   Max.   :1145.0   Max.   :992.6   Max.   :365.00  
##     Strength     
##  Min.   : 2.332  
##  1st Qu.:23.707  
##  Median :34.443  
##  Mean   :35.818  
##  3rd Qu.:46.136  
##  Max.   :82.599

Next, we split the dataset into 700 samples for training, with the remaining samples allocated for testing.

set.seed(123)

train_samples<- sample(nrow(concrete), 700)

train <- concrete[train_samples, ]
test <- concrete[-train_samples, ]

1. Fit a Random Forest

We will fit a Random Forest model to determine the importance of each variable and its contribution to the response variable.

Variable Importance by reduction of impurity

A split occurs at a decision node of a tree and is chosen to maximize some impurity-reduction metric. The splitting criterion is:

\[C(T) - C(T') = N_rQ_r - (N_{r'}Q_{r'} + N_{r''}Q_{r''})\]

Where \(N_r\) is the number of observations at node \(r\) and \(Q_r\) is the impurity measure at that node (in this case, the variance of the response variable). If \(C(T) - C(T') > 0\), the split is considered justified, as it indicates a reduction in impurity (variance) after splitting node \(r\) into children \(r'\) and \(r''\).

The importance of a variable in a Random Forest is derived from how much it contributes to reducing impurity across all trees in the forest.
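The quantity above can be computed by hand for a single candidate split. The following is an illustrative sketch only (ranger evaluates this internally over all candidate splits); `impurity_reduction` is a name introduced here for illustration:

```r
# Toy illustration of the variance-based impurity reduction at a split
impurity_reduction <- function(y, x, cutpoint) {
  left  <- y[x <= cutpoint]
  right <- y[x >  cutpoint]
  node_cost  <- length(y)     * mean((y - mean(y))^2)           # N_r * Q_r
  child_cost <- length(left)  * mean((left  - mean(left))^2) +  # N_r' * Q_r'
                length(right) * mean((right - mean(right))^2)   # N_r'' * Q_r''
  node_cost - child_cost  # C(T) - C(T'): positive => split reduces variance
}
```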

The resulting scores represent the variable importance and indicate how much each feature contributed to the model’s performance.

rf_imp <- ranger(formula = Strength ~ ., data = train, importance='impurity')

Variable Importance by out-of-bag random permutations

This approach evaluates the predictive power of each feature by comparing model performance before and after permuting the values of that feature.

Features that significantly degrade performance when permuted are more important because the model relies heavily on them for predictions.
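The idea can be sketched by hand for a single feature. This is illustrative only (`perm_importance` is a hypothetical helper; ranger's `importance = 'permutation'` does the permutation on the out-of-bag samples of each tree instead of a fixed dataset):

```r
# Hand-rolled permutation importance for one feature (sketch)
perm_importance <- function(model, data, feature, target = "Strength") {
  mse <- function(d) mean((data[[target]] - predict(model, data = d)$predictions)^2)
  permuted <- data
  permuted[[feature]] <- sample(permuted[[feature]])  # destroy the feature's signal
  mse(permuted) - mse(data)  # increase in error => importance
}
```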

rf_perm <- ranger(formula = Strength ~ ., data = train, importance='permutation')

Graphical comparison of both Variable Importance measures

rf_imp_vip <- vip(rf_imp, num_features = 8)
rf_perm_vip <- vip(rf_perm, num_features = 8)
grid.arrange(rf_imp_vip, rf_perm_vip, ncol=2, top="Left: Reduction in impurity at splits. Right: Out-of-bag permutations")

Results look very similar for both approaches. The ordering of the variables by importance is almost the same; the only differences are that Slag and FineAggr, and FlyAsh and CoarseAggr, swap positions between the two models.

Variable Importance of each variable by Shapley Values

Now we will compute the variable importance using Shapley Values.
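vip's SHAP method calls the model through a prediction wrapper with an `(object, newdata)` signature; the `yhat` used below is assumed to be defined along these lines for ranger:

```r
# Prediction wrapper for ranger: extract $predictions from the predict() result
yhat <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}
```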

rf_imp_shapley <- vip(rf_imp, method="shap",
                    pred_wrapper=yhat, num_features = 8,
                    newdata=test[,-9], train=train)

rf_perm_shapley <- vip(rf_perm, method="shap",
                    pred_wrapper=yhat, num_features = 8,
                    newdata=test[,-9], train=train)

grid.arrange(rf_imp_vip, rf_perm_vip, rf_imp_shapley, rf_perm_shapley,
             ncol=2, nrow=2, top="Top left: Impurity. Top right: OOB permutations. \n Bottom left: Shapley values for impurity. Bottom right: Shapley values for OOB permutations")

Looking at the Shapley values, the comparison is almost the same: the order of FlyAsh and CoarseAggr is reversed, as in the previous step, although in this case Slag and FineAggr keep the same order in both models.

2. Fit a linear model and a gam model

We will follow the same approach by fitting a Linear Model and a Generalized Additive Model to analyze the contributions of the variables.

lm_model <- lm(Strength ~ ., data = train)

summary(lm_model)
## 
## Call:
## lm(formula = Strength ~ ., data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.934  -5.998   0.602   6.881  34.880 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -22.163227  30.941577  -0.716    0.474    
## Cement        0.113761   0.010020  11.353  < 2e-16 ***
## Slag          0.097243   0.011870   8.193 1.24e-15 ***
## FlyAsh        0.073444   0.014957   4.911 1.13e-06 ***
## Water        -0.120030   0.046478  -2.583    0.010 *  
## Superplast    0.517791   0.116083   4.461 9.54e-06 ***
## CoarseAggr    0.018545   0.010992   1.687    0.092 .  
## FineAggr      0.013261   0.012532   1.058    0.290    
## Age           0.121099   0.006842  17.700  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 691 degrees of freedom
## Multiple R-squared:  0.6252, Adjusted R-squared:  0.6209 
## F-statistic: 144.1 on 8 and 691 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(lm_model)

The residuals seem normally distributed but not homoscedastic. This suggests that applying the logarithm to Strength might stabilize the variance and improve the fit.

lm_log <- lm(log(Strength) ~ ., data = train)

summary(lm_log)
## 
## Call:
## lm(formula = log(Strength) ~ ., data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.51322 -0.22062  0.09024  0.26409  0.74334 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.7297821  1.1215704   1.542 0.123462    
## Cement       0.0036110  0.0003632   9.942  < 2e-16 ***
## Slag         0.0028969  0.0004303   6.733 3.50e-11 ***
## FlyAsh       0.0027693  0.0005421   5.108 4.21e-07 ***
## Water       -0.0030306  0.0016847  -1.799 0.072477 .  
## Superplast   0.0163411  0.0042078   3.884 0.000113 ***
## CoarseAggr   0.0004923  0.0003984   1.236 0.217029    
## FineAggr     0.0001757  0.0004543   0.387 0.699012    
## Age          0.0041021  0.0002480  16.540  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3672 on 691 degrees of freedom
## Multiple R-squared:  0.5628, Adjusted R-squared:  0.5578 
## F-statistic: 111.2 on 8 and 691 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(lm_log)

Applying the logarithm to the response variable reduces the variance explained by the model, so we undo the transformation.

gam_model <- gam(Strength ~ s(Age) + s(Cement) + s(Slag) + s(FlyAsh)
                 + s(Superplast) + s(CoarseAggr) + s(FineAggr) + s(Water)
                 , data = train)

summary(gam_model)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Strength ~ s(Age) + s(Cement) + s(Slag) + s(FlyAsh) + s(Superplast) + 
##     s(CoarseAggr) + s(FineAggr) + s(Water)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  35.5892     0.2062   172.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                 edf Ref.df       F  p-value    
## s(Age)        8.400  8.810 250.570  < 2e-16 ***
## s(Cement)     7.934  8.692  38.512  < 2e-16 ***
## s(Slag)       5.958  7.063  23.305  < 2e-16 ***
## s(FlyAsh)     7.503  8.368   6.926  < 2e-16 ***
## s(Superplast) 6.869  7.889   4.166 7.63e-05 ***
## s(CoarseAggr) 1.000  1.000   0.458    0.499    
## s(FineAggr)   8.572  8.935   9.130  < 2e-16 ***
## s(Water)      8.512  8.916  17.078  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.89   Deviance explained = 89.9%
## GCV =  32.35  Scale est. = 29.774    n = 700
par(mfrow=c(2,2))
plot(gam_model)

gam.check(gam_model)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 24 iterations.
## The RMS GCV score gradient at convergence was 5.116767e-06 .
## The Hessian was positive definite.
## Model rank =  73 / 73 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                 k'  edf k-index p-value    
## s(Age)        9.00 8.40    0.97   0.190    
## s(Cement)     9.00 7.93    0.89  <2e-16 ***
## s(Slag)       9.00 5.96    0.93   0.030 *  
## s(FlyAsh)     9.00 7.50    0.91   0.005 ** 
## s(Superplast) 9.00 6.87    0.92   0.015 *  
## s(CoarseAggr) 9.00 1.00    0.84  <2e-16 ***
## s(FineAggr)   9.00 8.57    0.82  <2e-16 ***
## s(Water)      9.00 8.51    0.87  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residuals are normally distributed and homoscedastic. However, the basis-dimension check suggests that the number of knots used to estimate s(Age) might be too low, so we increase it.

gam_model2 <- gam(Strength ~ s(Age, k=14) + s(Cement) + s(Slag) + s(FlyAsh)
                 + s(Superplast) + s(CoarseAggr) + s(FineAggr) + s(Water)
                 , data = train)
gam.check(gam_model2)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 23 iterations.
## The RMS GCV score gradient at convergence was 1.404314e-05 .
## The Hessian was positive definite.
## Model rank =  75 / 77 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                  k'   edf k-index p-value    
## s(Age)        13.00  9.77    0.98   0.245    
## s(Cement)      9.00  7.96    0.89  <2e-16 ***
## s(Slag)        9.00  6.26    0.93   0.040 *  
## s(FlyAsh)      9.00  7.47    0.90  <2e-16 ***
## s(Superplast)  9.00  6.86    0.92   0.015 *  
## s(CoarseAggr)  9.00  1.00    0.84  <2e-16 ***
## s(FineAggr)    9.00  8.57    0.82  <2e-16 ***
## s(Water)       9.00  8.60    0.86  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(gam_model2)

The model has not improved, and we cannot increase k any further, since Age takes only 14 distinct values in the training dataset.
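The claim about distinct ages can be verified directly on the training split:

```r
# Age takes few distinct values, which caps the usable basis dimension k
length(unique(train$Age))  # 14 distinct values, per the discussion above
```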

Overall, we keep the original full LM and GAM, since they will allow us to compare variable importance across them and the fitted random forest.

Summarize the fitted models

We will summarize, again numerically and graphically, the fitted models.

summary(lm_model)
## 
## Call:
## lm(formula = Strength ~ ., data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.934  -5.998   0.602   6.881  34.880 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -22.163227  30.941577  -0.716    0.474    
## Cement        0.113761   0.010020  11.353  < 2e-16 ***
## Slag          0.097243   0.011870   8.193 1.24e-15 ***
## FlyAsh        0.073444   0.014957   4.911 1.13e-06 ***
## Water        -0.120030   0.046478  -2.583    0.010 *  
## Superplast    0.517791   0.116083   4.461 9.54e-06 ***
## CoarseAggr    0.018545   0.010992   1.687    0.092 .  
## FineAggr      0.013261   0.012532   1.058    0.290    
## Age           0.121099   0.006842  17.700  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 691 degrees of freedom
## Multiple R-squared:  0.6252, Adjusted R-squared:  0.6209 
## F-statistic: 144.1 on 8 and 691 DF,  p-value: < 2.2e-16
summary(gam_model)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Strength ~ s(Age) + s(Cement) + s(Slag) + s(FlyAsh) + s(Superplast) + 
##     s(CoarseAggr) + s(FineAggr) + s(Water)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  35.5892     0.2062   172.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                 edf Ref.df       F  p-value    
## s(Age)        8.400  8.810 250.570  < 2e-16 ***
## s(Cement)     7.934  8.692  38.512  < 2e-16 ***
## s(Slag)       5.958  7.063  23.305  < 2e-16 ***
## s(FlyAsh)     7.503  8.368   6.926  < 2e-16 ***
## s(Superplast) 6.869  7.889   4.166 7.63e-05 ***
## s(CoarseAggr) 1.000  1.000   0.458    0.499    
## s(FineAggr)   8.572  8.935   9.130  < 2e-16 ***
## s(Water)      8.512  8.916  17.078  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.89   Deviance explained = 89.9%
## GCV =  32.35  Scale est. = 29.774    n = 700
par(mfrow=c(2,2))
plot(lm_model)

plot(gam_model)

par(mfrow=c(1,1))

Variable Importance by Shapley values in the linear and gam fitted models

lm_shapley <- vip(lm_model, 
                  method="shap",
                  pred_wrapper=predict.lm, 
                  num_features = 8,
                  newdata=test[,-9],
                  exact=TRUE, train=train) 
gam_shapley <- vip(gam_model, 
                   method="shap",
                   pred_wrapper=predict.gam, 
                   num_features = 8,
                   newdata=test[,-9],
                   exact=TRUE, train=train) 

grid.arrange(lm_shapley, gam_shapley, ncol=2, top="Left: Shapley values of linear model. \n Right: Shapley values of Generalized additive model")

The Shapley values for the Linear Model (LM) and the Generalized Additive Model (GAM) yield different insights compared to the Random Forests. In the LM and GAM, the Cement variable stands out as the most significant predictor, whereas in the Random Forests, Age plays a more prominent role. However, across all models, Cement is highlighted as the first or second most important variable, whereas the importance of the other variables can vary a little depending on the model.

3. Variable Relevance by Ghost Variables
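The ghost-variables approach replaces each predictor in the test set by its prediction from the remaining predictors and measures how much the model's predictions move. A conceptual sketch of the relevance measure for one feature (illustrative only; `ghost_relevance` is a hypothetical helper, and the sourced `relev.ghost.var` additionally reports an eigen-decomposition of the relevance matrix):

```r
# Conceptual sketch of ghost-variable relevance for one feature
ghost_relevance <- function(model, newdata, feature) {
  pred  <- predict(model, newdata)
  ghost <- newdata
  # "Ghost" version: the feature as predicted (here linearly) from the others
  other <- setdiff(names(newdata), feature)
  ghost[[feature]] <- predict(lm(reformulate(other, feature), data = newdata),
                              newdata = newdata)
  mean((pred - predict(model, ghost))^2)  # how much the predictions shift
}
```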

source("relev.ghost.var.R")
Rel_Gh_Var_lm <- relev.ghost.var(model= lm_model, 
                              newdata = test[,-9],
                              y.ts = test$Strength,
                              func.model.ghost.var = lm
)
par(cex.main = 1.7)  # Increase eigenvalues plot's title size
plot.relev.ghost.var(Rel_Gh_Var_lm,n1=700,ncols.plot = 3)

aux <- cbind(Rel_Gh_Var_lm$relev.ghost,lm_shapley$data$Importance)
plot(aux[,1],aux[,2],col=0,xlab="Relev. by Ghost Variables",ylab="Shapley Var. Imp.")
text(aux[,1],aux[,2],row.names(aux))

From the plot we can observe that, using ghost-variable relevance, Age is the most relevant explanatory variable, followed by Cement and Slag. The first two components account for a significant portion of the variability, with the first and second eigenvalues explaining 64% and 30% of it, respectively. In the first component most of the relevance is attributed to Age, while in the second component Cement and Slag dominate.

When comparing these findings with the Shapley values, we notice a shift in relevance: Age, previously among the least relevant variables, is now the most important. Cement and Slag remain relevant, but with slightly reduced importance compared to the Shapley values. The relevance of the remaining variables appears consistent between the two measures.

Rel_Gh_Var_gam <- relev.ghost.var(model= gam_model, 
                              newdata = test[,-9],
                              y.ts = test$Strength,
                              func.model.ghost.var = lm
)
par(cex.main = 1.7)  # Increase eigenvalues plot's title size
plot.relev.ghost.var(Rel_Gh_Var_gam,n1=700,ncols.plot = 3)

aux <- cbind(Rel_Gh_Var_gam$relev.ghost,gam_shapley$data$Importance)
plot(aux[,1],aux[,2],col=0,xlab="Relev. by Ghost Variables",ylab="Shapley Var. Imp.")
text(aux[,1],aux[,2],row.names(aux))

From the plot we can see that, using ghost-variable relevance, Age remains the most relevant explanatory variable. The first component explains 65% of the variability, with most of its relevance attributed to Age. The second component accounts for 20% of the variability, with relevance primarily attributed to Cement and FlyAsh, and slightly less to Slag.

When comparing this with the Shapley values, Age remains the most significant variable. However, FlyAsh carries more importance in the ghost-variable analysis than Cement and Slag, which, in contrast, are attributed more importance by the Shapley values.

rf_concrete = randomForest(Strength ~ ., data=train)


Rel_Gh_Var_rf <- relev.ghost.var(model=rf_concrete, 
                              newdata = test[,-9],
                              y.ts = test$Strength,
                              func.model.ghost.var = lm
)
par(cex.main = 1.7)  # Increase eigenvalues plot's title size
plot.relev.ghost.var(Rel_Gh_Var_rf,n1=700,ncols.plot = 3)

aux <- cbind(Rel_Gh_Var_rf$relev.ghost,rf_imp_shapley$data$Importance)
plot(aux[,1],aux[,2],col=0,xlab="Relev. by Ghost Variables",ylab="Shapley Var. Imp.")
text(aux[,1],aux[,2],row.names(aux))

Age seems to be the most important variable: the first component explains 88% of the variability and nearly all of its relevance is brought by Age. The second component explains nearly 7% of the variability, with its importance mainly attributed to Cement and Slag. Compared to the Shapley values, Age is the most relevant variable here, whereas in the Shapley ranking it sits near the bottom; Cement and Slag seem important in both methods.

4. Explaining the Random Forest with DALEX

We create a DALEX explainer for the fitted Random Forest.

explainer_rf <- explain.default(model = rf_imp,  
                               data = test[,-9],
                               y = test$Strength, 
                               label = "Random Forest")
## Preparation of a new explainer is initiated
##   -> model label       :  Random Forest 
##   -> data              :  330  rows  8  cols 
##   -> target variable   :  330  values 
##   -> predict function  :  yhat.ranger  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package ranger , ver. 0.17.0 , task regression (  default  ) 
##   -> predicted values  :  numerical, min =  8.363111 , mean =  36.30909 , max =  75.63867  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -20.31393 , mean =  -0.006253068 , max =  25.08166  
##   A new explainer has been created!

a) Variable Importance by Random Permutations

Rnd_Perm <- model_parts(
  explainer_rf,
  N = NULL, # All available data are used
  B = 10   # number of permutations to be used, with B = 10 used by default
)

Rnd_Perm
##        variable mean_dropout_loss         label
## 1  _full_model_          5.934092 Random Forest
## 2        FlyAsh          6.659445 Random Forest
## 3    CoarseAggr          6.667983 Random Forest
## 4      FineAggr          7.148863 Random Forest
## 5          Slag          7.276549 Random Forest
## 6    Superplast          7.997304 Random Forest
## 7         Water          9.310147 Random Forest
## 8        Cement         11.550053 Random Forest
## 9           Age         14.198205 Random Forest
## 10   _baseline_         22.010103 Random Forest
plot(Rnd_Perm)

aux.plot <- plot(Rnd_Perm)
dropout_loss.y <- Rnd_Perm$dropout_loss[1]
aux.I <- order(-aux.plot$data$dropout_loss.x)
rf_perm_DALEX_as_vi <- tibble::tibble(aux.plot$data[aux.I,c(2,4)])
class(rf_perm_DALEX_as_vi) <- c("vi", class(rf_perm_DALEX_as_vi))
names(rf_perm_DALEX_as_vi) <- c("Variable", "Importance")
rf_perm_DALEX_as_vi$Importance <- 
  rf_perm_DALEX_as_vi$Importance - dropout_loss.y

# Creating the ggplot: 
rf_perm_DALEX_vip <- vip(rf_perm_DALEX_as_vi)

grid.arrange(rf_imp_vip, rf_perm_vip,
             rf_perm_DALEX_vip, ncol=2, nrow=2,
             top="Top left: Impurity. Top right: oob permutations. Bottom left: test sample permutations"
             )

All three feature plots produce results that are consistent with one another, with the test sample permutation showing a closer resemblance to the OOB permutation.

b) Partial Dependence Plot for each explanatory variable

PDP_rf <- model_profile(
  explainer=explainer_rf,
  variables = NULL,  # All variables are used
  N = NULL, # All available data are used
  groups = NULL,
  k = NULL,
  center = TRUE,
  type = "partial" #  partial, conditional or accumulated
)

plot(PDP_rf, facet_ncol=2)
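A partial-dependence value at a grid point is just the average model prediction with that variable held fixed for every observation. A hand-rolled sketch of what `model_profile` automates (`pdp_point` is a hypothetical helper relying on the standard fields of a DALEX explainer):

```r
# Manual partial dependence of the prediction on one variable at one value
pdp_point <- function(explainer, variable, value) {
  data <- explainer$data
  data[[variable]] <- value  # fix the variable for all observations
  mean(explainer$predict_function(explainer$model, data))
}
# e.g. pdp_point(explainer_rf, "Age", 28) averages predictions at Age = 28
```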

c) Local (or Conditional) Dependence Plot for each explanatory variable

CDP_rf <- model_profile(
  explainer=explainer_rf,
  variables = NULL,  # All variables are used
  N = NULL, # All available data are used
  groups = NULL,
  k = NULL,
  center = TRUE,
  type = "conditional" #  partial, conditional or accumulated
)

plot(CDP_rf, facet_ncol=2)

Conditional dependence is similar to partial dependence but reveals some nuanced differences. For instance, Superplast exhibits a sharper increasing dependency on the target variable, indicating its significant contribution at higher values. Similarly, CoarseAggr demonstrates a more pronounced decreasing dependency, implying a stronger negative impact as its value increases. Lastly, the relationship between Cement and the target variable appears to be more linear, suggesting a consistent influence across its range.

5. Local Explanations

Now, we are interested in explaining the weakest and strongest concretes in the test dataset.

local_low <- test[which.min(test$Strength), ]
local_high <- test[which.max(test$Strength), ]

a) SHAP

shap_low  <- predict_parts(explainer=explainer_rf
                           , new_observation=local_low
                           ,  type="shap")
shap_high <- predict_parts(explainer=explainer_rf
                           , new_observation=local_high
                           , type="shap")

shap_low
##                                           min          q1      median
## Random Forest: Age = 3            -13.3202610 -12.4934376 -11.9042256
## Random Forest: Cement = 108.3      -9.9072352  -8.4276250  -8.0929361
## Random Forest: CoarseAggr = 938.2  -0.7558079  -0.6006756  -0.0982932
## Random Forest: FineAggr = 849      -3.1922013  -2.3517140  -2.1107560
## Random Forest: FlyAsh = 0          -0.7713527  -0.2302253   0.1858270
## Random Forest: Slag = 162.4        -1.2426721   0.5253203   0.8709904
## Random Forest: Superplast = 0      -5.0589186  -3.2178575  -2.6077533
## Random Forest: Water = 203.5       -5.3702437  -4.8750094  -4.1984289
##                                          mean          q3        max
## Random Forest: Age = 3            -11.8290053 -11.4294416 -9.6588984
## Random Forest: Cement = 108.3      -7.9283065  -7.4237452 -6.0221260
## Random Forest: CoarseAggr = 938.2  -0.1580071   0.1208928  0.4116251
## Random Forest: FineAggr = 849      -2.1606687  -1.9265369 -1.6104835
## Random Forest: FlyAsh = 0           0.2219229   0.9014332  1.2210993
## Random Forest: Slag = 162.4         0.9301561   1.8981939  2.2844444
## Random Forest: Superplast = 0      -2.8316599  -2.0377832 -1.3882451
## Random Forest: Water = 203.5       -4.1904128  -3.7691978 -2.2508298
plot(shap_low)

shap_high
##                                         min        q1   median     mean
## Random Forest: Age = 91           7.2574873 8.7627443 9.310507 9.201861
## Random Forest: Cement = 389.9     4.9079839 6.3844236 7.192446 7.401717
## Random Forest: CoarseAggr = 944.7 0.4968657 0.7495974 1.271960 1.560704
## Random Forest: FineAggr = 755.8   1.0275997 1.4705977 1.760566 1.870738
## Random Forest: FlyAsh = 0         0.9630022 1.3074026 1.735142 1.875402
## Random Forest: Slag = 189         2.7208130 2.9332063 3.513683 4.103456
## Random Forest: Superplast = 22    3.2328063 3.9284667 4.676292 4.805212
## Random Forest: Water = 145.9      5.8082881 6.4046618 6.639292 6.618314
##                                         q3       max
## Random Forest: Age = 91           9.889640 10.597548
## Random Forest: Cement = 389.9     8.945368  9.582976
## Random Forest: CoarseAggr = 944.7 2.154907  3.466131
## Random Forest: FineAggr = 755.8   2.308660  2.881110
## Random Forest: FlyAsh = 0         2.273406  2.956602
## Random Forest: Slag = 189         5.118516  6.583485
## Random Forest: Superplast = 22    5.507521  6.869975
## Random Forest: Water = 145.9      6.856756  7.346905
plot(shap_high)

According to SHAP, the weakest concrete in the test dataset is weak for the following reasons, in decreasing order of relevance: it is young and has little Cement, a lot of Water, little Superplast and a lot of FineAggr.

On the other hand, all features strengthen the strongest concrete in the dataset. The most important ones are Age (old), Cement (large), Water (little), Superplast (large) and Slag (large). However, the importance of some of these features shows a wide spread across permutations, namely Cement, Superplast and Slag, so their positions in the relevance order might actually be higher or lower.

b) Break-down

bd_low  <- predict_parts(explainer_rf
                         , new_observation=local_low
                         ,  type="break_down")
bd_high <- predict_parts(explainer_rf
                         , new_observation=local_high
                         , type="break_down")

bd_low
##                                   contribution
## Random Forest: intercept                36.309
## Random Forest: Age = 3                 -12.408
## Random Forest: Cement = 108.3           -5.679
## Random Forest: Water = 203.5            -2.401
## Random Forest: Slag = 162.4             -0.040
## Random Forest: Superplast = 0           -3.960
## Random Forest: FineAggr = 849           -2.077
## Random Forest: FlyAsh = 0               -0.719
## Random Forest: CoarseAggr = 938.2       -0.663
## Random Forest: prediction                8.363
plot(bd_low)

bd_high
##                                   contribution
## Random Forest: intercept                36.309
## Random Forest: Age = 91                  9.373
## Random Forest: Water = 145.9             6.118
## Random Forest: Cement = 389.9            4.908
## Random Forest: Superplast = 22           3.641
## Random Forest: Slag = 189                4.995
## Random Forest: FineAggr = 755.8          2.192
## Random Forest: FlyAsh = 0                2.746
## Random Forest: CoarseAggr = 944.7        3.466
## Random Forest: prediction               73.746
plot(bd_high)

The break-down plot for the weakest concrete disagrees with the SHAP estimates on the importance of Slag and FlyAsh, since it shows that they make this concrete weaker, even if only slightly (-0.040 MPa and -0.719 MPa, respectively). We can also see how the total contribution of all predictors reduces this observation's Strength from the 36.309 intercept to the 8.363 prediction.

As for the strongest concrete, the break-down plot's interpretation is also similar to SHAP's, but some features' importance changes. For example, Water is now more important than Cement, Superplast is less relevant than Slag, and CoarseAggr contributes more than FineAggr.

Overall, we should be more confident in SHAP's results, because break-down's depend on the order of the explanatory variables, which is a clear downside of this local IML method.
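This order dependence can be checked empirically: `predict_parts` forwards extra arguments to iBreakDown, whose `order` argument fixes the variable ordering. A sketch, assuming the objects above (the chosen ordering here is arbitrary):

```r
# Same observation, explicit variable ordering: individual attributions
# may change, while the final prediction stays the same
bd_low_alt <- predict_parts(explainer_rf,
                            new_observation = local_low,
                            type = "break_down",
                            order = c("CoarseAggr", "FlyAsh", "FineAggr",
                                      "Superplast", "Slag", "Water",
                                      "Cement", "Age"))
plot(bd_low_alt)
```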

c) LIME

lime_low  <- predict_surrogate(explainer_rf
                               , new_observation=local_low
                               ,  type="localModel")
lime_high <- predict_surrogate(explainer_rf
                               , new_observation=local_high
                               , type="localModel")

lime_low
##     estimated         variable original_variable dev_ratio response
## 1  36.3090920     (Model mean)                    0.485556         
## 2  45.6673191      (Intercept)                    0.485556         
## 3 -11.4028739 Cement <= 238.27            Cement  0.485556         
## 4   0.0502877  FlyAsh <= 94.11            FlyAsh  0.485556         
## 5  -5.6135248   Water > 181.88             Water  0.485556         
## 6  -1.5576940  Superplast <= 6        Superplast  0.485556         
##   predicted_value         model
## 1        8.363111 Random Forest
## 2        8.363111 Random Forest
## 3        8.363111 Random Forest
## 4        8.363111 Random Forest
## 5        8.363111 Random Forest
## 6        8.363111 Random Forest
plot(lime_low)

lime_high
##    estimated        variable original_variable dev_ratio response
## 1 36.3090920    (Model mean)                   0.3957745         
## 2 33.4424892     (Intercept)                   0.3957745         
## 3  9.1400776    Cement > 350            Cement 0.3957745         
## 4  1.8299698   Slag > 161.49              Slag 0.3957745         
## 5  0.9045521    FlyAsh <= 94            FlyAsh 0.3957745         
## 6  5.9824659 Water <= 166.03             Water 0.3957745         
##   predicted_value         model
## 1         73.7465 Random Forest
## 2         73.7465 Random Forest
## 3         73.7465 Random Forest
## 4         73.7465 Random Forest
## 5         73.7465 Random Forest
## 6         73.7465 Random Forest
plot(lime_high)

LIME’s explanation of the weakest concrete coincides with SHAP’s except for Age. Specifically, the method identified the following properties of this observation that make it weak, in decreasing order of relevance: Cement <= 238.27, Water > 181.88 and Superplast <= 6, while FlyAsh <= 94.11 has a negligible positive contribution.

In the strongest concrete, LIME’s results do not mention Age’s role either. Furthermore, CoarseAggr, which was not very relevant in SHAP’s or break-down’s results, does not appear in LIME’s at all. Again, it shows that this concrete’s strength is due to having a lot of cement (Cement > 350) and slag (Slag > 161.49), little water (Water <= 166.03) and little fly ash (FlyAsh <= 94).

d) ICE / ceteris paribus

cp_low  <- predict_profile(explainer_rf, new_observation=local_low)
cp_high <- predict_profile(explainer_rf, new_observation=local_high)

plot(cp_low,  facet_ncol=2)

plot(cp_high, facet_ncol=2)

With ICE, we can better understand why the weakest concrete is weak. We can see that, ceteris paribus, strong concretes are at least 100 days old and that, before then, strength is acquired very quickly; hence, ceteris paribus, a concrete 3 days old is very weak. The effect of Cement also seems to grow linearly (from now on, all statements regarding the effect of predictors on the response are assumed ceteris paribus), so the concrete gains a lot of strength with large quantities of it, which is not the case for the weakest concrete. Other relevant features are FineAggr (which barely strengthens the concrete when FineAggr > 760 \(kg/m^3\)), Superplast (whose addition strengthens the concrete up to about 12 \(kg/m^3\) in the mixture) and Water, which shows a local minimum around 200 \(kg/m^3\), the weakest concrete’s concentration.

The strongest concrete’s ICE curves are similar but vertically displaced. Moreover, this observation’s predictor values lie in neighborhoods of local maxima, hence maximizing the strengthening effect of every component in the mixture. The displacement with respect to the weakest observation is more noticeable in features with flat profiles, such as CoarseAggr, FlyAsh or Slag, whose curves shift from around 10 to about 70 MPa. This highlights the importance of the ceteris paribus assumption, since the ICE plot of one predictor changes with the observation’s values of the remaining variables.

e) ICE for ‘Age’ for each test instance + global PDP

age_rf <- model_profile(explainer_rf, variables="Age", N=100, type="partial")

plot(age_rf, geom = "profiles") +  
  ggtitle("Ceteris-paribus and partial-dependence profiles for Age") 

We can see how the previously described ICE curve of Age is similar for all observations in the test dataset and, therefore, consistent with the PDP: a steep increase from Age = 0 to Age = 80, after which the curve remains almost constant. Between observations, however, we notice a roughly constant vertical displacement: the lowest observation is about 25 MPa weaker than the average (the PDP) and the highest about 20 MPa stronger.